48 research outputs found

    Uncovering cis Regulatory Codes Using Synthetic Promoter Shuffling

    Revealing the spectrum of combinatorial regulation of transcription at individual promoters is essential for understanding the complex structure of biological networks. However, the computations represented by the integration of various molecular signals at complex promoters are difficult to decipher in the absence of simple cis regulatory codes. Here we synthetically shuffle the regulatory architecture — operator sequences binding activators and repressors — of a canonical bacterial promoter. The resulting library of complex promoters allows for rapid exploration of promoter encoded logic regulation. Among all possible logic functions, NOR and ANDN promoter encoded logics predominate. A simple transcriptional cis regulatory code determines both logics, establishing a straightforward map between promoter structure and logic phenotype. The regulatory code is determined solely by the type of transcriptional regulation combinations: two repressors generate a NOR: NOT (a OR b) whereas a repressor and an activator generate an ANDN: a AND NOT b. Three-input versions of both logics, having an additional repressor as an input, are also present in the library. The resulting complex promoters cover a wide dynamic range of transcriptional strengths. Synthetic promoter shuffling represents a fast and efficient method for exploring the spectrum of complex regulatory functions that can be encoded by complex promoters. From an engineering point of view, synthetic promoter shuffling enables the experimental testing of the functional properties of complex promoters that cannot necessarily be inferred ab initio from the known properties of the individual genetic components. Synthetic promoter shuffling may provide a useful experimental tool for studying naturally occurring promoter shuffling
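The regulatory code described in this abstract (two repressors give NOR, an activator plus a repressor gives ANDN) can be illustrated with a small truth-table sketch. This is an idealized Boolean model and not the authors' method: the function names `promoter_logic` and `truth_table` are hypothetical, and the rule "the promoter is ON only when every activator is present and no repressor is present" is an assumption that ignores the graded transcriptional strengths the abstract mentions.

```python
from itertools import product


def promoter_logic(regulators):
    """Return a Boolean function for a promoter with the given regulator types.

    `regulators` is a tuple of "activator" / "repressor" strings, one per input.
    Assumption: the promoter is ON only if every activator input is present
    and every repressor input is absent.
    """
    def output(inputs):
        return all(
            present if kind == "activator" else not present
            for kind, present in zip(regulators, inputs)
        )
    return output


def truth_table(regulators):
    """Enumerate all input combinations and the resulting promoter state."""
    f = promoter_logic(regulators)
    return {
        inputs: f(inputs)
        for inputs in product((False, True), repeat=len(regulators))
    }


# Two repressors: NOT (a OR b)
nor = truth_table(("repressor", "repressor"))

# Activator a and repressor b: a AND NOT b
andn = truth_table(("activator", "repressor"))

# Three-input variant with an additional repressor, as mentioned in the abstract
andn3 = truth_table(("activator", "repressor", "repressor"))

for name, table in (("NOR", nor), ("ANDN", andn)):
    print(name)
    for inputs, state in table.items():
        print("  ", inputs, "->", state)
```

Under this assumption the promoter architecture alone fixes the logic function, which matches the abstract's claim of a direct map between promoter structure and logic phenotype.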

    A Novel Method of Characterizing Genetic Sequences: Genome Space with Biological Distance and Applications

    Most existing methods for phylogenetic analysis involve developing an evolutionary model and then using some type of computational algorithm to perform multiple sequence alignment. There are two problems with this approach: (1) different evolutionary models can lead to different results, and (2) the computation time required for multiple alignments makes it impossible to analyse the phylogeny of a whole genome. This motivates us to create a new approach to characterize genetic sequences. To each DNA sequence, we associate a natural vector based on the distributions of nucleotides. This produces a one-to-one correspondence between the DNA sequence and its natural vector. We define the distance between two DNA sequences to be the distance between their associated natural vectors. This creates a genome space with a biological distance, which makes global comparison of genomes with the same topology possible. We use our proposed method to analyze the genomes of the new influenza A (H1N1) virus, human rhinoviruses (HRV) and mammalian mitochondria. The results show that a triple-reassortant swine virus circulating in North America and the Eurasian swine virus belong to the lineage of the influenza A (H1N1) virus. For the HRV and mammalian mitochondrial genomes, the results coincide with biologists' analyses. Our approach provides a powerful new tool for analyzing and annotating genomes and their phylogenetic relationships. Whole or partial genomes can be handled more easily and more quickly than using multiple alignment methods. Once a genome space has been constructed, it can be stored in a database. There is no need to reconstruct the genome space for subsequent applications, whereas in multiple alignment methods, realignment is needed to add new sequences. Furthermore, one can make a global comparison of all genomes simultaneously, which no other existing method can achieve.
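The idea of mapping each sequence to a vector and comparing vectors instead of alignments can be sketched as follows. The abstract does not list the components of the natural vector, so this sketch assumes three per nucleotide: its count, the mean of its positions, and a normalized second central moment of its positions. The Euclidean distance is also an assumption, as are the function names `natural_vector` and `distance`. This reduced 12-component vector is for illustration only and is not claimed to give the one-to-one correspondence stated in the abstract.

```python
import math

NUCLEOTIDES = "ACGT"


def natural_vector(seq):
    """Return a 12-component vector: (count, mean position, scaled 2nd moment) per base.

    Assumed components, not taken from the abstract:
      count  = number of occurrences of the base
      mean   = average 1-based position of the base
      second = sum((pos - mean)^2) / (count * len(seq))
    Bases that do not occur contribute zeros.
    """
    seq = seq.upper()
    n = len(seq)
    vec = []
    for base in NUCLEOTIDES:
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        count = len(positions)
        if count == 0:
            vec.extend([0.0, 0.0, 0.0])
            continue
        mean = sum(positions) / count
        second = sum((p - mean) ** 2 for p in positions) / (count * n)
        vec.extend([float(count), mean, second])
    return vec


def distance(seq_a, seq_b):
    """Euclidean distance between the vectors of two sequences (assumed metric)."""
    return math.dist(natural_vector(seq_a), natural_vector(seq_b))


print(natural_vector("ACGT"))
print(distance("ACGT", "TGCA"))
```

Because each sequence is reduced to a fixed-length vector, adding a new genome only requires computing one new vector, which reflects the abstract's point that no realignment is needed.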

    Hypervariable intronic region in NCX1 is enriched in short insertion-deletion polymorphisms and showed association with cardiovascular traits

    Background: Conserved non-coding regions (CNR) have been shown to harbor gene expression regulatory elements. Genetic variations in these regions may potentially contribute to complex disease susceptibility. Methods: We targeted CNRs of a cardiovascular disease (CVD) candidate gene, Na(+)-Ca(2+) exchanger (NCX1), with polymorphism screening among CVD patients (n = 46) using DHPLC technology. The flanking region (348 bp) of the 14 bp indel in intron 2 was further genotyped by DGGE assay in two Eastern-European CVD samples: essential hypertension (HYPEST; 470 cases, 652 controls) and coronary artery disease, CAD (CADCZ; 257 cases, 413 controls). Genotype-phenotype associations were tested by regression analysis implemented in PLINK. Alignments of primate sequences were performed by ClustalW2. Results: Nine of the identified NCX1 variants were either singletons or targeted by commercial platforms. The 14 bp intronic indel (rs11274804) was represented with substantial frequency in HYPEST (6.82%) and CADCZ (14.58%). Genotyping in Eastern-Europeans (n = 1792) revealed the hypervariable nature of this locus, represented by seven alternative alleles. The alignments of human-chimpanzee-macaque sequences showed that the major human variant (allele frequency 90.45%) was actually a human-specific deletion compared to other primates. In humans, this deletion was surrounded by other short (5-43 bp) deletion variants and a duplication (40 bp) polymorphism possessing overlapping breakpoints. This indicates a potential indel hotspot, triggered by the initial deletion in the human lineage. An association was detected between the carrier status of the 14 bp indel ancestral allele and CAD (P = 0.0016, OR = 2.02; Bonferroni significance level alpha = 0.0045), but not with hypertension. The risk of CAD development was even higher among the patients additionally diagnosed with metabolic syndrome (P = 0.0014, OR = 2.34). Consistent with the effect on metabolic processes, suggestive evidence for the association with heart rate, serum triglyceride and LDL levels was detected (P = 0.04). Conclusions: Compared to SNPs targeted by a large number of locus-specific and genome-wide assays, considerably less attention has been paid to short indel variants in the human genome. Data on the genome dynamics, mutation rate and population genetics of short indels, as well as their impact on gene expression profiles and human disease susceptibility, are limited. The characterization of the NCX1 intronic hypervariable non-coding region enriched in human-specific indel variants helps to fill this gap in knowledge.

    The Human Gene Mutation Database (HGMD®) and its exploitation in the study of mutational mechanisms

    The Human Gene Mutation Database (HGMD) constitutes a comprehensive core collection of data on germ-line mutations in nuclear genes underlying or associated with human inherited disease (http://www.hgmd.org). Data cataloged include single base-pair substitutions in coding, regulatory, and splicing-relevant regions, microdeletions and microinsertions, indels, and triplet repeat expansions, as well as gross gene deletions, insertions, duplications, and complex rearrangements. Each mutation is entered into HGMD only once, in order to avoid confusion between recurrent and identical-by-descent lesions. By June 2005, the database contained in excess of 53,000 different lesions detected in 2029 different nuclear genes, with new entries currently accumulating at a rate in excess of 5000 per annum. HGMD includes cDNA reference sequences, now provided for more than 90% of the listed genes, splice junction data, disease-associated and functional polymorphisms, and links to data present in publicly available online locus-specific mutation databases

    Evolution of the proximal promoter region of the mammalian growth hormone gene

    The evolutionary relationship between the proximal growth hormone (GH) gene promoter sequences of 12 mammalian species was explored by comparison of their trinucleotide composition and by multiple sequence alignment. Both approaches yielded results that were consistent with the known fossil record-based phylogeny of the analysed sequences, suggesting that the two methods of tree reconstruction might be equally efficient and reliable. The pattern of evolution inferred for the mammalian GH gene promoters was found to vary both temporally and spatially. Thus, two distinct regions devoid of any evolutionary changes exist in primates, but only one of these 'gaps' is also observed in rodents, and neither is seen in ruminants. Furthermore, different evolutionary rates must have prevailed during different periods of evolutionary time and in different lineages, with a dramatic increase in evolutionary rate apparent in primates. Since a similar pattern of discontinuity has been previously noted for the evolution of the GH-coding regions, it may reflect the action of positive selection operating upon the GH gene as a single cohesive unit. Strong evidence for the action of gene conversion between primate GH gene promoters is provided by the fact that the human GH1 and GH2 sequences, which are thought to have diverged before the divergence of Old World monkeys from great apes, are more similar to one another than either is to the rhesus monkey GH2 promoter. Finally, it was noted that a number of nucleotide positions in the GH1 gene promoter that are polymorphic in humans appear to be highly conserved in mammals. This apparent conundrum, which could represent a caveat for the interpretation of phylogenetic footprinting studies, is potentially explicable in terms either of reduced genetic diversity in highly inbred animal species or insufficient population data from non-human species